The distribution of most genes is not random, and functionally linked genes
are often found in clusters. Several theories have been put forward to explain
the emergence and persistence of operons in bacteria [?]. Careful analysis
of genomic data favours the co-regulation model [?], where gene organization
into operons is driven by the benefits of coordinated gene expression
and regulation. Direct evidence that co-expression increases the individual's
fitness enough to ensure operon formation and maintenance is, however,
still lacking. Here, a previously described quantitative model of the network
that controls the transcription factor σF during sporulation in Bacillus subtilis
[4] is employed to quantify the benefits arising both from organisation of
the sporulation genes into the spoIIA operon and from translational coupling.
The analysis shows that operon organization, together with translational coupling,
is important because of the inherent stochastic nature of gene expression,
which skews the ratios between protein concentrations in the absence of
co-regulation. The predicted impact of different forms of gene regulation on
fitness and survival agrees quantitatively with published sporulation efficiencies.
Standfirst text

The benefits of co-regulated gene expression have been suggested to drive operon emergence
and persistence, but direct evidence that co-expression increases an individual's fitness is lacking.
Here, a previously described quantitative model of the σF signaling network is employed
to show that the inherent noise in gene expression can be sufficiently harmful that co-regulated
expression can substantially increase survival chances.

Main findings of the study 

• the study provides further support for the co-regulation model for operon formation

• the study reveals that small variations in gene expression, as arise from the inherent
stochasticity of biological processes, can be harmful, and that co-regulation of the
expression of interacting proteins by organization of the genes into operons can substantially
increase survival chances

• the quantification of the impact of co-regulation on an individual's fitness is possible for
the first time because of the detailed mathematical model that we have developed recently
for the genes encoded in the spoIIA operon.
1 Introduction



The available genome sequences demonstrate that many genes are clustered on chromosomes
according to their function. Genes in bacteria are not only clustered but can also be organized
into operons such that the expression of a group of genes is regulated by the same genetic control
element. When operons were first discovered it was assumed that the benefit of co-transcription
led to operon assembly [?]. Other models have since been proposed, and these belong to one
of three classes: the natal model, the Fisher model, or the selfish operon model [?]. According
to the natal model, clustering of genes is the consequence of gene duplication. However, since
operons comprise genes that belong to very distant families and the majority of paralogs do not
cluster, this model is insufficient to explain operon origin [?]. A recast of the Fisher model,
adapted to prokaryotes, proposes that clustering of genes reduces the likelihood that co-adapted
genes become separated by recombination. However, this does not explain how operons can
emerge, as recombination is as likely to generate clusters as to disrupt them. According to the
selfish operon model, operons facilitate the horizontal transfer of functionally related genes [?].
The physical proximity of genes thus does not necessarily provide a selective advantage to the
individual organism but rather to the gene cluster itself, because it can be efficiently transmitted
both horizontally and vertically. Recent studies have, however, failed to observe the
gene cluster pattern predicted by the model, and this strongly suggests that the selfish operon
model does not explain the emergence and persistence of operons [?]. So what drives operon
assembly?

The idea that co-transcription of genes provides a selective advantage to the individual
organism has never been contradicted. It has been questioned only because it remains unclear
whether the benefits of co-transcription could be strong enough to drive the assembly of
operons by rare recombination events [?]. A genotype that confers higher fitness will dominate
in a population with bounded total population size only if selection acts on a timescale that
is substantially shorter than the timescale on which recombination and mutation events could
negate the benefits.

There are a number of potential selective advantages given by co-transcription. In the case of
operons that code for multi-protein complexes, co-transcription enables co-translational folding
[?], it limits the half-life of toxic monomers [2], and it reduces stochastic differences in gene
expression [?]. Operons that do not code for interacting proteins may be advantageous because
of the co-regulation of protein expression [?]. Many examples of this class are
metabolic operons, where co-regulated expression is likely to optimize the
flux and to facilitate the regulation of functions, especially if these are required only under
certain environmental conditions, or if complex regulatory structures are employed [?].

Evidence in favour of any of these proposed driving forces has so far largely been obtained
from comparative genomics. Here we use a previously derived quantitative model for
the network that controls the transcription factor σF during sporulation in Bacillus subtilis
to quantify the benefits of co-expression. Spore formation in Bacillus subtilis is a response to
nutrient deprivation at high cell density and involves asymmetric septation and
compartment-specific initiation of gene expression [9]. The different gene programs in the larger mother cell
and the smaller prespore are both directed by the transcription factor σF which, although only
active in the smaller prespore, also affects the transcriptional program across the septum in the
mother cell, a phenomenon that is referred to as criss-cross regulation [10]. Successful sporulation
therefore requires the rapid septation-dependent and prespore-specific activation of σF.
σF is kept inactive by binding to SpoIIAB and is released upon binding of SpoIIAA (Fig. 1).
SpoIIAA is phosphorylated by SpoIIAB [11] and reactivated by the serine phosphatase SpoIIE
[12]. The balance between kinase and phosphatase activity thus determines whether or not σF
is released from its inactive complex with SpoIIAB. SpoIIE accumulates on both sides of the
asymmetrically positioned septum and therefore has an increased activity in the smaller
compartment [13]. A quantitative model of the regulatory network predicts that because of the low
turnover rate most SpoIIE is bound by its substrate such that enzyme and substrate increase
together in the smaller compartment [4]. According to the model, this combined increase is
sufficient to trigger the formation of micromolar concentrations of σF holoenzyme in the prespore.

It is obvious from the above that the protein concentration ratio is important. An excess of
σF or SpoIIAA compared to SpoIIAB will result in free σF and σF-dependent gene expression,
while an excess of SpoIIAB will prevent SpoIIAA-dependent σF release. In the vegetative
cell the sporulation proteins are not detectable, and septation is preceded by 90-120 min of gene
expression, depending on the exact experimental conditions [14][15][16]. Limiting the stochastic
noise inherent in protein expression can be expected to be crucial for avoiding variations
in the relative protein concentrations and the resulting sporulation defects. Three of the four
proteins in the network are transcribed from genes in the spoIIA operon (Fig. 2A). These genes
are not only co-transcribed into a single mRNA but are also most likely to be co-expressed
since the translation of the three proteins appears to be coupled, at least to some degree. This
system therefore offers an excellent opportunity to analyse the influence of transcriptional and
translational co-regulation of the sporulation genes on an individual's survival and fitness.
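
The point that co-regulation protects the concentration ratio can be illustrated with a minimal numerical sketch. It is not the model used in this study. It assumes that each expression rate equals the reference rate multiplied by a normally distributed factor with relative standard deviation `eta` (a hypothetical parameter name), and it compares a shared factor (coupled expression) with two independent factors.

```python
import random
import statistics

def ratio_spread(eta, coupled, n=10_000, seed=0):
    """Standard deviation of the SpoIIAB:SpoIIAA expression-rate ratio.

    Assumption (not from the paper): each rate is the reference rate times
    a factor drawn from Normal(1, eta), kept above a small positive floor.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        f_aa = max(rng.gauss(1.0, eta), 1e-3)
        # Coupled expression: both genes share one noise factor.
        f_ab = f_aa if coupled else max(rng.gauss(1.0, eta), 1e-3)
        ratios.append(f_ab / f_aa)
    return statistics.pstdev(ratios)

coupled_spread = ratio_spread(0.2, coupled=True)
independent_spread = ratio_spread(0.2, coupled=False)
print(coupled_spread, independent_spread)
```

With a shared factor the ratio is fixed by construction, whereas independent factors let it fluctuate; how much fluctuation the real network tolerates is what the full model is needed for.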

Coupled translation is achieved when two genes are translated by the same ribosome. Reinitiation
of translation at a nearby start codon after termination at the upstream gene is possible
because ribosome dissociation from the mRNA is a slow and energy-dependent process [17].
There is currently no direct experimental evidence for coupled translation of the spoIIA operon.
Such coupling can, however, be postulated based on the arrangement of genes [18]. The first
two genes in the spoIIA operon (encoding SpoIIAA and SpoIIAB) overlap by four basepairs,
while the genes for SpoIIAB and σF are separated by 11 basepairs (Fig. 2A); coupled translation
has been documented for intercistronic distances of more than 60 basepairs [17]. The
majority of genes that are organized in operons are separated by distances comparable to those
found in the spoIIA operon [19], so that the studied system can be considered as representative
of operons in general. The efficiency of reinitiation depends on the distance as well as the
strength of the Shine-Dalgarno sequence [17][20], which is, in general, located 5-13 basepairs
upstream of a start codon and which binds to the complementary 3' end of the 16S rRNA, a
component of the 30S ribosomal subunit. Moreover, the secondary structure of the mRNA can affect
lateral diffusion of the ribosomes [20].

According to the protein expression data for the spoIIA operon it appears that the last gene
in the operon, σF, is expressed at much lower levels than are SpoIIAA and SpoIIAB, while
SpoIIAB monomers may be expressed at equal or up to 3-times higher levels compared to
SpoIIAA [14][15][16]. The weaker expression of a downstream gene (as is the case for σF) can,
in general, be accounted for by a weaker ribosomal binding site which is removed far enough
from the termination codon of the upstream cistron that a considerable fraction of ribosomes
dissociate from the mRNA before translation can be reinitiated [17]. It should be noted that
while the transcriptional and translational coupling will reduce the noise in the relative SpoIIAB
to σF expression levels, the unbinding of ribosomes is necessarily a stochastic process and will
therefore add a (low) level of noise. The stronger expression of a downstream gene (as may be
the case for SpoIIAB relative to SpoIIAA) can, in general, only be observed if a strong initiation
sequence for the downstream gene is occluded by mRNA secondary structure which is melted
by the ribosome that translates the upstream gene [17]. Such a condition does not seem to be
met by the gene for SpoIIAB, and more accurate expression data will be necessary to establish
whether more SpoIIAB than SpoIIAA is expressed.

Available expression data can best be captured by an expression rate of 6 × 10^-9 M s^-1 for
SpoIIAB dimers and SpoIIAA and of 2 × 10^-9 M s^-1 for σF and SpoIIE [4]; it should be
noted that the simulation yields qualitatively similar results if SpoIIAB monomers and SpoIIAA
are expressed at equal rates (6 × 10^-9 M s^-1), as long as the σF and SpoIIE expression rate
is then reduced to 10^-9 M s^-1 [21]. As discussed in [?], the linear increase in the protein
concentration assumed here does not fully match the experimental observations. There are,
nonetheless, two good reasons to use a linear model. First of all, the data are too inaccurate
and, in parts, contradictory to be modeled exactly. Secondly, the chosen rates correspond to
the protein concentrations measured at the time of septation [14][15][16], the critical time point
to judge sporulation success. This is because in the cell the SpoIIE concentration increases more
slowly than the other protein concentrations and only increases sharply immediately before
septation [22]. As a consequence, the greatest danger of spontaneous uncompartmentalized
activation of σF is just before septation, and this risk is fully assessed by the linear expression
model. Since our analysis focuses mainly on what happens minutes before and after septation,
individual fluctuations in the global expression rates during the 2 hours preceding septation are
not important and the linear protein expression rates used should be considered as an averaged
protein expression rate per bacterium.
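
As a rough illustration of what the linear expression model implies, the sketch below converts the quoted reference rates into total protein concentrations after the 2 hours of expression that precede septation. It is a back-of-the-envelope calculation only: it assumes accumulation from zero and ignores degradation and complex formation, which the full model accounts for. The function and variable names are illustrative.

```python
SECONDS_PER_MINUTE = 60

def concentration_at(rate_molar_per_s, minutes):
    """Concentration (in M) reached after linear accumulation from zero."""
    return rate_molar_per_s * minutes * SECONDS_PER_MINUTE

# Reference expression rates quoted in the text (M per second).
rates = {"SpoIIAA": 6e-9, "SpoIIAB dimer": 6e-9, "sigmaF": 2e-9, "SpoIIE": 2e-9}

# Total concentrations in micromolar after 120 minutes of expression.
at_septation_uM = {name: concentration_at(r, 120) * 1e6 for name, r in rates.items()}
print(at_septation_uM)
```

Under these assumptions SpoIIAA and SpoIIAB dimers each reach roughly 43 µM and σF and SpoIIE roughly 14 µM in total by the time of septation.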

Our quantitative ordinary differential equation model is very detailed: it comprises 50
dependent variables and 150 kinetic constants to describe the dynamics of only four proteins; the
reader is referred to a detailed discussion of the model in the Supplementary Material of [4].
Given its high level of detail and accuracy the model predicts the phenotypes of essentially all
mutants for which the biochemical effect is known. We can therefore expect that the predicted
sporulation efficiencies in response to changes in parameter values are realistic. In the following
we employ the model to quantify how far different levels of stochastic noise in gene expression,
as modulated by different degrees of coupling of protein expression (that is by the coupling of
both transcription and translation), affect the sporulation efficiency, that is the survival chances.
2 Results and Discussion



In the following we address how variations in the protein expression rates affect the sporulation
efficiency. Here we will look at the effect of parallel changes in all protein expression rates as
well as at the effects of independent changes that skew the ratios of protein concentrations. As
the standard "wild-type" protein expression rates we use 6 × 10^-9 M s^-1 for SpoIIAA and
SpoIIAB dimers and 2 × 10^-9 M s^-1 for σF and SpoIIE [4]. After 120 minutes of protein expression
the septum forms and SpoIIE accumulates on both sides of this septum. This is modeled by a
four-fold increase in the concentration of SpoIIE, together with its associated substrate
(phosphorylated SpoIIAA), in the prespore. As before we define a successful sporulation event by the
requirement that before septation the concentration of σF-RNA polymerase holoenzyme does
not exceed 0.4 µM while after septation the concentration exceeds one micromolar [4].
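
The success criterion and the septation step can be written down compactly. The following is a minimal sketch with illustrative function names; the holoenzyme concentrations would in practice come from integrating the full ODE model, which is not reproduced here, and the thresholds are taken to be 0.4 µM and 1 µM.

```python
PRE_SEPTATION_MAX_UM = 0.4   # holoenzyme must not exceed this before septation
POST_SEPTATION_MIN_UM = 1.0  # holoenzyme must exceed this after septation
SPOIIE_FOLD_INCREASE = 4     # modeled accumulation of SpoIIE in the prespore

def prespore_spoIIE(spoIIE_uM):
    """SpoIIE concentration in the prespore immediately after septation."""
    return SPOIIE_FOLD_INCREASE * spoIIE_uM

def is_successful_sporulation(holo_before_uM, holo_after_uM):
    """Apply the two-sided criterion on the holoenzyme concentration."""
    return (holo_before_uM <= PRE_SEPTATION_MAX_UM
            and holo_after_uM > POST_SEPTATION_MIN_UM)

print(is_successful_sporulation(0.2, 1.5))  # quiet before, active after
print(is_successful_sporulation(0.5, 1.5))  # premature activation
```

The criterion is two-sided on purpose: premature activation before septation and insufficient activation afterwards both count as failure.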

If the protein expression rates are all varied in parallel, that is by a common factor as
denoted on the horizontal axis in Figure 2B, we find that the predicted sporulation efficiency is
not affected as long as a minimal expression rate is kept to provide sufficient σF for binding
to the RNA polymerase (Fig. 2B, grey lines). If the expression of SpoIIE is kept constant (in
order to reflect that this protein is transcribed from a different locus and may therefore vary
independently) then an independent 2.5-fold increase in the other sporulation proteins can still be
tolerated before the relative activity of the phosphatase becomes too weak (Fig. 2B, black lines).
An even higher independent increase in the expression of the spoIIA genes can be tolerated if
we assume that the expression of the spoIIA and spoIIE genes is at least weakly correlated such
that a large increase in the expression of the spoIIA genes is accompanied by a small increase in
the expression of the spoIIE gene (Fig. 2C). Such a correlation is not unexpected considering
that variations in gene expression are the result of both intrinsic and extrinsic noise. The latter,
which reflects cell-to-cell variation in the concentration of other molecular species such as the
RNA polymerase, will affect all genes similarly. We can conclude that the independent
regulation of the spoIIA and spoIIE genes is unlikely to generate a major risk of failed sporulation.
Separation of the spoIIA and spoIIE genes on the bacterial chromosome, on the other hand, has
benefits because it ensures that, upon septation, each compartment retains one copy of spoIIE
while initially (for the first 10-15 min) two copies of spoIIA are in the mother cell but none in
the prespore [23]. This initial transient genetic imbalance may protect the mother cell from a
relative increase of spoIIE to spoIIA gene products [?].

If the expression levels of the genes in the spoIIA operon are varied independently of each
other, the tolerance of the network to variations in gene expression drops substantially. In
particular, if SpoIIAB and SpoIIAA are no longer co-regulated, the network is sensitive to rather
small changes (Fig. 2D, grey lines and circles). Thus if the SpoIIAA expression rate remains
fixed and the SpoIIAB expression rate increases by 60% (corresponding to the factor 1.6 on the
horizontal axis in Fig. 2D), then sporulation is predicted to fail; 60% variation from the mean
is a noise level observed in bacterial (E. coli) expression systems [24]. On the other hand, if
SpoIIAA and SpoIIAB remain co-regulated but σF expression is regulated independently (Fig.
2D, black lines), the network is rather robust to variations in gene expression as long as the
expression of SpoIIAB is increased more than the expression of σF and the overall σF
concentration remains high enough to form micromolar concentrations of the holoenzyme. The
transcriptional coupling together with a strong translational coupling of SpoIIAA and SpoIIAB
therefore substantially increases the robustness of the network to fluctuations in gene
expression. Stochastic variations in the relative rate of σF translation, on the other hand, are not as
detrimental as long as the translation efficiency for σF is lower than for SpoIIAA and SpoIIAB,
as can be achieved by a weaker ribosomal binding site and the resulting (stochastic) dissociation
of ribosomes. An advantage of preferential dissociation of the ribosomes before translating the
gene for σF is that the bacterium saves the energy that would otherwise be required to translate,
and subsequently degrade, unnecessary (harmful) copies of σF. Considering that σF comprises
255 amino acids and linkage of each amino acid requires the equivalent of 4 ATPs, the energy
saved by not translating and degrading 10 µM σF corresponds to more than 10 mM ATP, which is a
considerable amount considering that the bacterial ATP concentration is 1-3 mM [25][26][27]
and sporulation is a response to starvation, that is energy deprivation.
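
The energy estimate follows from simple arithmetic, reproduced here as a short check. The sketch counts only the peptide-bond cost quoted in the text (4 ATP equivalents per amino acid) and leaves out the additional cost of degradation, which is why the text speaks of "more than" 10 mM. Variable names are illustrative.

```python
RESIDUES_SIGMA_F = 255     # amino acids in sigmaF
ATP_PER_RESIDUE = 4        # ATP equivalents per amino acid linked
SIGMA_F_AVOIDED_UM = 10    # micromolar sigmaF that is not translated

# ATP equivalents saved on synthesis alone, in millimolar.
atp_saved_mM = RESIDUES_SIGMA_F * ATP_PER_RESIDUE * SIGMA_F_AVOIDED_UM / 1000
print(atp_saved_mM)

# Compare with the upper end of the quoted cellular ATP concentration (1-3 mM).
print(atp_saved_mM / 3.0)
```

The saving on synthesis alone comes to about 10.2 mM ATP equivalents, which is more than three times the upper end of the quoted 1-3 mM cellular ATP concentration.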

In a last step we can now quantify the impact of gene organisation on sporulation efficiency,
and therefore fitness. For this we assume that the gene expression levels in the cell population
follow a normal distribution with variance η around the mean value. Given the complex
regulation pattern of gene expression, gene expression levels are unlikely to be distributed exactly
normally. A normal distribution is, however, still likely to provide an approximation no worse than
what could be obtained with a detailed model of the regulatory process in the absence of
sufficient data to determine all required parameter values [8]. Sporulation efficiency is determined
as the fraction of simulation runs for which the concentration of σF-RNA polymerase
holoenzyme does not exceed 0.4 µM before septation and exceeds one micromolar after septation [4].
For each condition the mean sporulation efficiency and standard deviation are calculated from
100 independent runs that are carried out 10 times. In each run the protein expression rates were
set randomly such that overall the respective distributions of the protein expression rates were
obtained. Determination of the sporulation efficiency for η ∈ [0, 1] shows that as long as the
sporulation genes are translationally coupled, even high variances hardly affect the sporulation
efficiency (Fig. 3A, black lines). The sporulation efficiency is even higher at high noise level, η,
if spoIIE expression co-varies with spoIIA expression, at least weakly (Fig. 3B). A lengthening
of the transcription time (that is, a delay in septation) when transcription levels are too low to
generate sufficient σF until septation will further increase robustness to fluctuations in the rate
of protein expression. Such a dependency of the time point of septation on the protein (and
in particular the SpoIIE) concentration is in agreement with experiments [28][29] and might
explain the large variance in the delay between the onset of sporulation and septation that is
observed under different sporulation conditions. Low levels of additional stochastic noise in σF
expression (broken lines), as may arise because of the stochastic dissociation of ribosomes, also
have rather little impact, which confirms that the weak coupling of SpoIIAB and σF translation does
not substantially reduce sporulation efficiency. If on the other hand spoIIAB is removed from
the operon and controlled independently by the same promoter, then the sporulation efficiency
drops rapidly (Fig. 3A, blue lines). This is in good quantitative agreement with experiments
which find that the sporulation efficiency drops to 40-80% of wildtype levels [30], especially
when considering that η ~ [0.3, 0.6] for these expression levels [24]. If spoIIAA is moved
instead, then the effect is reduced (Joanna Clarkson, personal communication), as also predicted
by the model (Fig. 3A, grey lines).
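
The sampling protocol described above (10 repetitions of 100 runs, each with randomly drawn expression rates) can be sketched as follows. The sketch does not contain the ODE model: `toy_sporulates` is a deliberately crude, hypothetical stand-in that only encodes the statement that sporulation fails once SpoIIAB expression exceeds SpoIIAA expression by 60% or more. It also treats η as a relative standard deviation, which is an assumption. The numbers it produces are therefore illustrative and are not the predictions shown in Fig. 3.

```python
import random
import statistics

def toy_sporulates(f_aa, f_ab):
    """Crude stand-in for the ODE model (assumption): fail once SpoIIAB
    expression exceeds SpoIIAA expression by 60% or more."""
    return f_ab < 1.6 * f_aa

def sporulation_efficiency(eta, coupled, batches=10, runs=100, seed=1):
    """Mean and standard deviation of the success fraction over batches."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(batches):
        successes = 0
        for _ in range(runs):
            # Fold change relative to the reference rate, kept positive.
            f_aa = max(rng.gauss(1.0, eta), 1e-6)
            # Coupled genes share one factor; uncoupled genes draw their own.
            f_ab = f_aa if coupled else max(rng.gauss(1.0, eta), 1e-6)
            successes += toy_sporulates(f_aa, f_ab)
        fractions.append(successes / runs)
    return statistics.mean(fractions), statistics.stdev(fractions)

print(sporulation_efficiency(0.3, coupled=True))
print(sporulation_efficiency(0.3, coupled=False))
```

Replacing `toy_sporulates` with a call that integrates the full model and applies the holoenzyme criterion would turn this into the procedure described in the text; the batching into 10 × 100 runs is what yields both a mean and a standard deviation per condition.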

It should be noted that this drop in sporulation efficiency has previously been accounted
for by the loss of the transient genetic imbalance when spoIIAB is moved to a chromosomal
position close to the origin of replication [30]. The transient lack of SpoIIAB expression in the
prespore together with accelerated degradation of unbound SpoIIAB [31] had been suggested to
enable σF release [30]. However, we have shown previously that the transient genetic imbalance
does not affect σF release on the timescale on which it persists [21], and stochastic effects are
therefore a much more likely explanation for the observed phenotype of the mutants.

We conclude from the analysis of this well studied model system that the protection from
stochastic variation in the expression rate of interacting proteins can substantially increase
viability, and therefore constitutes a driving force for gene clustering and co-regulation. Whilst
the importance of gene dosage had been recognized before [32], and underexpression and
overexpression of protein complex subunits in yeast had been shown to lower fitness [33], this
study reveals that much smaller variances, as can result from stochastic effects, can already
have substantial detrimental effects. The detailed analysis of the expression of the sporulation
proteins therefore demonstrates the optimized character of gene regulation and suggests that
co-regulation of genes serves to optimize cellular network dynamics in spite of the inherent noise
in all biological processes.
Figures



Figure 1: An overview of the interactions in the network that controls σF activity in
Bacillus subtilis. For details see text. The figure is a reproduction of Figure 1 in [4].

Figure 2: The impact of parallel and random variations in the expression of spoIIE and
spoIIA genes on σF release. (A) The spoIIA operon comprises the genes for SpoIIAA,
SpoIIAB, and σF. The genes for SpoIIAA and SpoIIAB overlap; the genes for SpoIIAB and σF are
separated by 11 basepairs. (B) The regulatory network is robust to parallel variations in gene
expression. The predicted concentration of σF-RNA polymerase holoenzyme before (dashed lines)
and after septum formation (continuous lines) if either all (grey lines) or all protein expression
rates except for the one of SpoIIE (black lines) were increased by the factor on the horizontal
axis compared to the standard reference rates (6 × 10^-9 M s^-1 for SpoIIAA and SpoIIAB dimers
and 2 × 10^-9 M s^-1 for σF and SpoIIE [4]). (C,D) The expression rate combinations for which
septation-dependent σF release is possible (between the lines) or not (outside the area marked
by lines). (C) The impact of differential regulation of spoIIE and spoIIA expression. The
vertical and horizontal axes indicate the fold variation in the spoIIE and spoIIA expression rates
respectively, compared to the standard reference rates. (D) The impact of differential regulation
of the expression of genes encoded in the spoIIA operon. The vertical axis indicates the fold
variation in the expression of SpoIIAA (circles), σF (black lines), or SpoIIAA and σF (grey
lines). The horizontal axis indicates the fold variation in the expression of SpoIIAB and of any
other protein whose expression is coupled to the one of SpoIIAB (which are those genes in the
spoIIA operon not reported on the vertical axis). The sudden jump observed at a high
SpoIIAB to σF ratio (lower black line) is the consequence of impaired σF release when the relative
SpoIIAB concentration is too high.

Figure 3: The impact of stochastic variation in gene expression on sporulation efficiency.
(A) The fraction of successful sporulation events (as defined in the text) dependent on the
variance in gene expression if the expression of all spoIIA genes is coupled (black lines), only the
expression of SpoIIAB and σF is coupled (grey lines), or only the expression of SpoIIAA and σF is
coupled (blue lines). SpoIIE is expressed throughout at the standard rate of 2 × 10^-9 M s^-1.
The broken lines show the effect of an additional independent normal variation in the rate of σF
expression with η_s = 0.1 (dashed lines) or η_s = 0.3 (dotted lines) from the coupled rates. If
σF is one of the coupled rates then σF expression is varied both together with its coupling
partner and additionally independently to reflect the additive levels of noise acting at the initiation
of translation and the re-initiation/dissociation step. (B) The fraction of successful sporulation
events (as defined in the text) dependent on the variance in gene expression if expression of
the spoIIA and spoIIE genes is coupled (to assess the benefits of correlated expression), and an
additional noise term η_E is added to the expression of spoIIE with η_E = 0.1 (black continuous
line), η_E = 0.3 (dotted line), or η_E = 0.6 (dashed line); η_E assesses the effects of independent
promoters and spatial heterogeneity in the concentration of transcription and translation factors.
The red line is identical to the continuous black line in panel A (noise in coupled spoIIA
expression, SpoIIE expressed at 2 × 10^-9 M s^-1). Mean and standard deviation are based on 10
times 100 independent runs.